Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech
Authors
Abstract
Depression is a mental disorder that threatens people's health and normal life. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection has mainly focused on utilizing different parallel features from audio, video, and text for performance enhancement, rather than making full use of the information inherent in speech. To focus on the more emotionally salient regions of speech, in this research we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features that preserve the original temporal relationships of the speech sequence, and then analyze how these features differ between the speech of depressed and non-depressed individuals. We then study various features and use a modified feature set as the input of the LSTM layer. Instead of using the output of a traditional LSTM directly, multi-head attention across the time dimension is employed to obtain the key time-related information by projecting the hidden states into multiple subspaces. The experimental results show that the proposed model leads to improvements of 2.3 and 10.3% on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpora, respectively.
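As a rough illustration of the described architecture, the following is a minimal sketch, assuming PyTorch; the layer sizes, number of heads, and mean pooling are illustrative choices, not the authors' exact configuration. It applies multi-head attention along the time dimension of the LSTM outputs and pools the attended frames into a clip-level depression/non-depression decision.

```python
# Minimal sketch (assumption: PyTorch; dimensions and pooling are illustrative,
# not the authors' exact configuration) of multi-head attention applied along
# the time dimension of LSTM outputs for clip-level depression classification.
import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    def __init__(self, n_features=80, hidden=128, heads=4, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # Multi-head self-attention over the time axis of the LSTM outputs;
        # each head projects the hidden states into its own subspace.
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, time, n_features)
        h, _ = self.lstm(x)                   # h: (batch, time, hidden)
        a, _ = self.attn(h, h, h)             # attend across time steps
        pooled = a.mean(dim=1)                # aggregate attended frames
        return self.fc(pooled)                # clip-level logits

model = AttentionLSTMClassifier()
logits = model(torch.randn(8, 200, 80))       # 8 clips, 200 frames, 80-dim features
```

Each attention head projects the LSTM hidden states into its own subspace before attending over time, which corresponds to the projection-into-subspaces step mentioned in the abstract.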
Similar resources
Speech dereverberation using long short-term memory
Recently, neural networks have been used not only for phone recognition but also for denoising and dereverberation. However, the conventional denoising deep autoencoder (DAE), based on a feed-forward structure, is not capable of handling the very long spans of reverberation across speech frames. An LSTM can be effectively trained to reduce the average error between the enhanced signal and the original clean signal by ...
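A minimal training sketch of the objective this snippet describes, assuming PyTorch, spectral-frame features, and toy dimensions (all placeholders, not the paper's setup): an LSTM enhancement network trained to minimize the mean squared error between its output and the clean reference.

```python
# Sketch (assumptions: PyTorch, log-spectral features, toy dimensions) of an
# LSTM enhancement network trained to reduce the average error between the
# enhanced output and the clean target.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=257, hidden_size=256, num_layers=2, batch_first=True)
proj = nn.Linear(256, 257)                    # map hidden states back to feature bins
opt = torch.optim.Adam(list(lstm.parameters()) + list(proj.parameters()), lr=1e-3)
mse = nn.MSELoss()

reverberant = torch.randn(4, 300, 257)        # placeholder batch: (batch, frames, bins)
clean = torch.randn(4, 300, 257)              # placeholder clean reference

h, _ = lstm(reverberant)
enhanced = proj(h)
loss = mse(enhanced, clean)                   # average error vs. the clean signal
loss.backward()
opt.step()
```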
Sensitivity from Short-Term Memory vs. Stability from Long-Term Memory in Visual Attention Method
In this paper, a special focus on the relationship between sensitivity and stability in a dynamic selective visual attention method is described. In this proposal, sensitivity is associated with short-term memory and stability with long-term memory, respectively. First, all the mechanisms necessary to provide sensitivity to the system are included in order to succeed in keeping the attention in ...
Long short-term memory networks for noise robust speech recognition
In this paper we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net exploiting long-range context information for phoneme prediction and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation va...
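For concreteness, a minimal sketch of the BLSTM front end described here, assuming PyTorch and illustrative feature/phoneme dimensions: a bidirectional LSTM maps framewise acoustic features to phoneme posteriors, which a separate decoder (the DBN in the paper) would then consume.

```python
# Sketch (assumptions: PyTorch; feature and phoneme-set sizes are illustrative)
# of a bidirectional LSTM producing framewise phoneme posteriors.
import torch
import torch.nn as nn

n_features, n_phonemes = 39, 41               # e.g. MFCCs with deltas, phoneme set size
blstm = nn.LSTM(n_features, 128, num_layers=2, batch_first=True, bidirectional=True)
readout = nn.Linear(2 * 128, n_phonemes)      # both directions concatenated

frames = torch.randn(4, 500, n_features)      # (batch, time, features)
h, _ = blstm(frames)
posteriors = readout(h).softmax(dim=-1)       # framewise phoneme posteriors for a decoder
```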
Extending Long Short-Term Memory for Multi-View Structured Learning
Long Short-Term Memory (LSTM) networks have been successfully applied to a number of sequence learning problems but they lack the design flexibility to model multiple view interactions, limiting their ability to exploit multi-view relationships. In this paper, we propose a Multi-View LSTM (MV-LSTM), which explicitly models the view-specific and cross-view interactions over time or structured ou...
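A simplified sketch of the cross-view idea, assuming PyTorch and two hypothetical views; this is not the paper's exact MV-LSTM memory partitioning, only an illustration of combining view-specific recurrence with cross-view interaction by feeding each view's cell the other view's previous hidden state.

```python
# Simplified illustration (assumption: PyTorch; NOT the paper's MV-LSTM
# formulation): each view keeps its own LSTM cell, and at every step a view's
# input is augmented with the other view's previous hidden state.
import torch
import torch.nn as nn

class TwoViewLSTM(nn.Module):
    def __init__(self, dim_a, dim_b, hidden):
        super().__init__()
        # each view's cell sees its own features plus the other view's last state
        self.cell_a = nn.LSTMCell(dim_a + hidden, hidden)
        self.cell_b = nn.LSTMCell(dim_b + hidden, hidden)
        self.hidden = hidden

    def forward(self, xa, xb):                # xa: (batch, time, dim_a), xb: (batch, time, dim_b)
        B, T, _ = xa.shape
        ha, ca = torch.zeros(B, self.hidden), torch.zeros(B, self.hidden)
        hb, cb = torch.zeros(B, self.hidden), torch.zeros(B, self.hidden)
        for t in range(T):
            # cross-view interaction: condition each update on the other view's previous state
            ha_new, ca = self.cell_a(torch.cat([xa[:, t], hb], dim=1), (ha, ca))
            hb, cb = self.cell_b(torch.cat([xb[:, t], ha], dim=1), (hb, cb))
            ha = ha_new
        return ha, hb                         # final per-view summaries

# hypothetical views: 40-dim acoustic features and 300-dim text embeddings
out_a, out_b = TwoViewLSTM(40, 300, 64)(torch.randn(2, 50, 40), torch.randn(2, 50, 300))
```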
Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition
The task of endpointing is to determine when the user has finished speaking. This is important for interactive speech applications such as voice search and Google Home. In this paper, we propose a GLDNN-based (grid long short-term memory deep neural network) endpointer model and show that it provides significant improvements over a state-of-the-art CLDNN (convolutional, long short-term memory, ...
Journal
Journal title: Frontiers in Neurorobotics
Year: 2021
ISSN: 1662-5218
DOI: https://doi.org/10.3389/fnbot.2021.684037